This describes the process of building a new monitoring server from scratch. It is modeled on the HamWAN server running at our Fremont site, but built on a new-vintage VM with more modern software versions. Rather than Cacti, we use Zabbix as the monitoring engine.
Add the Debian bookworm net-install ISO to the Proxmox server.
Create a VM and install Debian bookworm:
16GB disk with LVM guided install with separate home, var, and tmp
100GB disk unconfigured for monitoring data
Boot and configure as a minimal server using the graphical interface:
hostname: monitoring
domain: ziply.hamwan.net
(root password in Vaultwarden)
apt install mg sudo pylint python3-virtualenv strace locate mtr rsyslog postfix redis-server redis-tools
updatedb # update the locate database
Change /etc/ssh/sshd_config to move the server to port 222.
/sbin/groupadd hamadmin
Add /etc/sudoers.d/hamadmin (copied from monitoring.hamwan.net).
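A minimal sketch of the sshd port change, assuming the stock Debian sshd_config with a single (possibly commented-out) Port line. set_sshd_port is a hypothetical helper name, not something taken from the Fremont server:

```shell
#!/bin/sh
# Sketch: rewrite the Port directive in an sshd_config-style file.
# set_sshd_port is a hypothetical helper, not part of the original notes.
set_sshd_port() {
    conf="$1"   # path to the config file, e.g. /etc/ssh/sshd_config
    port="$2"   # port to listen on, e.g. 222
    tmp="$(mktemp)" || return 1
    # Replace an active or commented-out "Port N" line with the new value.
    # Lines that do not look like a Port directive are left untouched.
    sed -E "s/^[[:space:]]*#?[[:space:]]*Port[[:space:]]+[0-9]+[[:space:]]*$/Port ${port}/" \
        "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}

# Usage (as root):
#   set_sshd_port /etc/ssh/sshd_config 222
#   /usr/sbin/sshd -t && systemctl restart ssh
```

Running sshd -t before the restart checks the edited file for syntax errors, which is worth doing before dropping the only session on a remote machine.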
My initial account had to be lower case (so kd7dk). I then fixed that to my standard HamWAN KD7DK account:
Note: at this point I believe our ansible automation is capable of creating all the other netop accounts.
Plumb the management LAN to an interface.
Add routing for net 10.44.0.0/16 via the mgmt LAN router. We need this to route to both the public and management networks without routing between them, while keeping the redundancy that we would lose with a default route. The key changes are in frr.conf and daemons, to support running 2 OSPF instances.
# default to using syslog. /etc/rsyslog.d/45-frr.conf places the log in
# /var/log/frr/frr.log
#
# Note:
# FRR's configuration shell, vtysh, dynamically edits the live, in-memory
# configuration while FRR is running. When instructed, vtysh will persist the
# live configuration to this file, overwriting its contents. If you want to
# avoid this, you can edit this file manually before starting FRR, or instruct
# vtysh to write configuration to a different file.
log syslog informational
frr defaults traditional
password <PASSWORD>
enable password <PASSWORD>
log file /var/log/frr/frr.log
interface ens18
 ip ospf 1 area 0
 ip ospf authentication message-digest
 ip ospf message-digest-key 1 md5 <OSPF_PASSWORD>
 ip ospf priority 10
interface ens19
 ip ospf 2 area 0
 ip ospf priority 10
interface lo
router ospf 1
 ospf router-id 44.25.67.58
 redistribute connected
 distribute-list AMPR out connected
 network 44.25.67.0/26 area 0
 network 44.25.0.0/23 area 0
 area 0 authentication message-digest
router ospf 2
 ospf router-id 10.44.4.8
 redistribute connected
 distribute-list MGMT out connected
 network 10.44.4.0/24 area 0
 network 10.44.200.0/23 area 0
access-list AMPR permit 44.0.0.0/9
access-list AMPR permit 44.128.0.0/10
access-list MGMT permit 10.44.0.0/16
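The daemons side of this change is not shown above. As a hedged sketch: FRR's multi-instance OSPF support is normally switched on with ospfd=yes plus an ospfd_instances list in /etc/frr/daemons. enable_ospf_instances is a hypothetical helper name, and the instance list 1,2 is inferred from the router ospf 1 and router ospf 2 stanzas, not copied from the real file:

```shell
#!/bin/sh
# Sketch: enable ospfd with a list of instance IDs in an FRR daemons file.
# enable_ospf_instances is a hypothetical helper; the instance IDs are an
# assumption based on the "router ospf 1" / "router ospf 2" stanzas.
enable_ospf_instances() {
    daemons="$1"     # path to the daemons file, e.g. /etc/frr/daemons
    instances="$2"   # comma-separated instance IDs, e.g. 1,2
    tmp="$(mktemp)" || return 1
    # Drop existing ospfd= and ospfd_instances= lines. Other settings such
    # as ospfd_options= or ospf6d= do not match this pattern and are kept.
    grep -Ev '^(ospfd|ospfd_instances)=' "$daemons" > "$tmp" || true
    printf 'ospfd=yes\nospfd_instances=%s\n' "$instances" >> "$tmp"
    cat "$tmp" > "$daemons"
    rm -f "$tmp"
}

# Usage (as root):
#   enable_ospf_instances /etc/frr/daemons 1,2
#   systemctl restart frr
```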
Install git.
apt install git
Install Docker according to https://docs.docker.com/engine/install/debian/
From https://github.com/zabbix/zabbix-docker/tree/7.2, follow the model docker-compose_v3_ubuntu_mysql_latest.yaml.
Add the following to /etc/fstab:
/dev/mapper/monitoring--data--vg-data /data ext4 defaults 0 2
followed by:
systemctl daemon-reload
If necessary, mount /data.
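A small sketch of adding the fstab entry so that re-running the step does not duplicate it. add_fstab_entry is a hypothetical helper. The device path is written with ASCII double hyphens, which is how device-mapper normally renders a hyphen inside a volume group name; that spelling is an assumption, so check it against ls /dev/mapper first:

```shell
#!/bin/sh
# Sketch: append an fstab entry only if its mount point is not already listed.
# add_fstab_entry is a hypothetical helper, not part of the original notes.
add_fstab_entry() {
    fstab="$1"   # path to the fstab file, e.g. /etc/fstab
    entry="$2"   # full fstab line to add
    # The mount point is the second whitespace-separated field of the entry.
    mountpoint="$(printf '%s\n' "$entry" | awk '{print $2}')"
    if awk -v mp="$mountpoint" \
        '$1 !~ /^#/ && $2 == mp {found=1} END {exit !found}' "$fstab"; then
        echo "fstab already has an entry for $mountpoint" >&2
    else
        printf '%s\n' "$entry" >> "$fstab"
    fi
}

# Usage (as root):
#   add_fstab_entry /etc/fstab \
#       '/dev/mapper/monitoring--data--vg-data /data ext4 defaults 0 2'
#   systemctl daemon-reload
#   mkdir -p /data && mount /data
```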
# Install MySQL (via the MariaDB fork)
apt install mariadb-server
systemctl stop mariadb
mkdir /data/mysql
chown mysql:mysql /data/mysql
Edit /etc/mysql/mariadb.conf.d/50-server.cnf:
datadir = /data/mysql
innodb_buffer_pool_size = 8G
Add an empty database to MySQL and grant access to 'zabbix' with a password.
mysql << DONE
create database zabbix;
grant all privileges on *.* to 'zabbix'@'localhost' identified by 'some-password';
DONE
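The grant above covers every database on the server. If the zabbix user only needs its own schema, a narrower variant might look like the sketch below. zabbix_db_sql is a hypothetical helper, and the utf8mb4 / utf8mb4_bin character set follows the Zabbix installation documentation rather than these notes. The password is pasted into the SQL as-is, so this assumes it contains no single quotes:

```shell
#!/bin/sh
# Sketch: print SQL that creates the zabbix database and a user limited to it.
# zabbix_db_sql is a hypothetical helper. The character set and collation are
# an assumption taken from the Zabbix install docs, not from the original notes.
zabbix_db_sql() {
    password="$1"   # must not contain single quotes (no escaping is done)
    cat <<EOF
CREATE DATABASE IF NOT EXISTS zabbix CHARACTER SET utf8mb4 COLLATE utf8mb4_bin;
CREATE USER IF NOT EXISTS 'zabbix'@'localhost' IDENTIFIED BY '${password}';
GRANT ALL PRIVILEGES ON zabbix.* TO 'zabbix'@'localhost';
EOF
}

# Usage (as root, with MariaDB running):
#   zabbix_db_sql 'some-password' | mysql
```

Printing the SQL first, rather than piping it straight in, makes it easy to review what will be run.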
Create a user to run the Zabbix containers:
useradd -m -c "Zabbix Monitoring" zabbix
apt install tcpdump mg locate
updatedb # (re)build the locate database
# In the zabbix-server container:
chmod u+s /usr/bin/fping
# Install the zabbix-agent Debian package (here and on other Linux servers)
apt install zabbix-agent
# then configure /etc/zabbix/zabbix_agentd.conf
Server=172.16.241.0/24
ServerActive=172.16.241.3:10051
Hostname=monitoring.ziply.hamwan.net
This uses the zabbix container addresses.
On any other repeater, these changes would look like:
Server=44.25.67.58
ServerActive=44.25.67.58:10051
Hostname=monitoring.ziply.hamwan.net
Server can be an address, CIDR range, or list of either. See the manual for more details.
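As an illustration of the list form, the sketch below prints the three agent settings for a given host. agent_conf is a hypothetical helper, and combining the container subnet with the public address in one Server list is only an example, not the configuration in use:

```shell
#!/bin/sh
# Sketch: print the Zabbix agent settings discussed above for one host.
# agent_conf is a hypothetical helper; the values passed in are examples.
agent_conf() {
    server="$1"    # address, CIDR range, or comma-separated list of either
    active="$2"    # server address and port used for active checks
    hostname="$3"  # name of this host as configured in Zabbix
    printf 'Server=%s\n' "$server"
    printf 'ServerActive=%s\n' "$active"
    printf 'Hostname=%s\n' "$hostname"
}

# Usage: print the settings, then merge them into
# /etc/zabbix/zabbix_agentd.conf and restart the agent:
#   agent_conf '172.16.241.0/24,44.25.67.58' '44.25.67.58:10051' "$(hostname -f)"
#   systemctl restart zabbix-agent
```

For active checks, the Hostname value has to match the host name configured in Zabbix, otherwise the server reports the host as not found.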
Disable unnecessary discovery in Mikrotik Template (CapsMAN, LTE)
cd ~zabbix/zabbix-docker
docker compose -f psdr.yaml --profile all up -d
docker compose --profile all down
docker logs -f _container_id_or_name_
Updated the Mikrotik by SNMP template to enhance dashboards and turn off some data collection (LTE, CAPSman)
Used configuration from Fremont largely unchanged. There is a key dependency on the Unfiltered.log file, where the bulk of HamWAN infrastructure logs. Hacking attempts are fed into fail2ban from here. The client logging configuration needs review: it had lots of hardcoded 44.24 addresses that were never updated. We need to consider what kind of separation we actually need.
Basically out of the box. Will probably need to update to add authentication.
Used new fail2ban.conf as starting point and merged in HamWAN changes. Added new HamWAN files from Fremont with review.
These provide a redis server and the long poll web servers for new bans. https://github.com/kd7lxl/blacklist-service
You also need to add this to 000-default-ssl.conf for apache:
<Location "/blacklist">
ProxyPass "http://127.0.0.1:1234/"
</Location>
This also requires enabling the proxy and proxy_http modules:
a2enmod proxy
a2enmod proxy_http
Add /srv/www/keys and this stanza to 000-default-ssl.conf for apache:
Upgrades come in 2 pieces
Basic steps:
ICMP ping loss trigger is too sensitive. Needs a longer sample interval I believe.
redis is complaining at startup:
WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
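One way to apply the fix from that warning is a sysctl.d drop-in rather than editing /etc/sysctl.conf directly. This is a sketch only: write_overcommit_conf and the file name 99-redis-overcommit.conf are hypothetical choices, not something already on the server:

```shell
#!/bin/sh
# Sketch: write the overcommit setting from the redis warning to a drop-in.
# write_overcommit_conf and the file name are hypothetical choices.
write_overcommit_conf() {
    dir="$1"   # sysctl drop-in directory, e.g. /etc/sysctl.d
    printf 'vm.overcommit_memory = 1\n' > "${dir}/99-redis-overcommit.conf"
}

# Usage (as root):
#   write_overcommit_conf /etc/sysctl.d
#   sysctl --system     # reload all sysctl configuration without a reboot
```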
S2.CapitolPark.hamwan.net MikroTik: Interface wlan1(): Link down
This happens because a sector that loses its last client connection changes OperStatus to down instead of dormant. This trigger may need to be revised for APs, or possibly deleted for APs.
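zabbix-server-1 | 69:20250205:143203.601 cannot send list of active checks to "172.16.241.1": host [monitoring] not found.
The agent at 172.16.241.1 announced itself as "monitoring", which does not match any host name configured in Zabbix. This is a configuration issue I believe.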
Figure out how to ensure all the Proxmox servers have the same inventory of install images.
Web GUI shows a valid certificate but complains about active content with certificate errors. Something like this posting: https://community.letsencrypt.org/t/chrome-69-0-3497-81-reports-active-content-with-certificate-errors/71545 Resolution: This was caused by cached javascript content from the prior self-signed certificate. Cleared the cached content and all was fine.
Ping health checks failing. Needed to add setuid bit to fping in the server container.
SNMP agent item "net.if.wireless.walk" on host "r1.capitolpark.hamwan.net" failed: first network error, wait for 15 seconds (and similar)
Possibly: https://www.zabbix.com/forum/zabbix-troubleshooting-and-problems/483095-zabbix-7-snmp-timeouts
Fixed this by increasing the timeout for SNMP queries (Administration > General > Timeouts). I changed it from 3s to 10s.
3 Mikrotik hosts are refusing to respond to SNMP get/getbulk: in this case s2.indianola, r3.baldi and capitolpark.queenanne.
This turned out to be a semi-known issue with SNMP on Mikrotik and asymmetric routes. RouterOS responds from the address of whichever interface has the best route back to the requester. If that is not the address the request was sent to, the requester cannot match the response to the request it sent. The solution is to give every device a stable address (on its loopback interface if it has more than one interface), make that the default address in the portal, and set src-address in /snmp to force all SNMP responses to come from that address.
Mikrotik templates (from Zabbix and third parties) https://www.zabbix.com/integrations/mikrotik
https://www.zabbix.com/documentation/current/en/manual/discovery/network_discovery